Understanding the Basics of Multiple Regression
In Chapter 16, we outline the derivation of the formulas for determining the parameters of a straight line so that the line — defined by an intercept at the Y axis and a slope — comes as close as possible to all the data points (imagine a scatter plot). The term as close as possible is operationalized as a least-squares line, meaning we are looking for the line where the sum of the squares (SSQ) of the vertical distances of each point from the line is the smallest. The SSQ is smaller for the least-squares line than for any other line you could possibly draw.
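To make the idea concrete, here is a minimal sketch in Python (the data and variable names are made up for illustration, and we assume NumPy is available). It fits a least-squares line with numpy.polyfit and shows that the fitted line's SSQ beats that of a slightly perturbed line:

```python
import numpy as np

# Illustrative data: x is the predictor, y is the outcome
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a straight line (degree-1 polynomial); polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

def ssq(a, b):
    """Sum of squared vertical distances of the points from the line Y = a + b*X."""
    residuals = y - (a + b * x)
    return np.sum(residuals ** 2)

print("Least-squares line SSQ:", ssq(intercept, slope))
# Any other line -- here, one with a slightly different slope -- has a larger SSQ
print("Perturbed line SSQ:", ssq(intercept, slope + 0.1))
```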
The same idea can be extended to multiple regression models containing more than one predictor (and therefore estimating more than two parameters). For two predictor variables, you're fitting a plane, which is a flat sheet. Imagine fitting a set of points to this plane in three dimensions (meaning you'd be adding a Z axis to your X and Y). Now, extend your imagination: for more than two predictors, you're fitting a hyperplane to points in four-or-more-dimensional space. Hyperplanes in multidimensional space may sound mind-blowing, but luckily for us, the actual formulas are simple algebraic extensions of the straight-line formulas.
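As a rough sketch of the two-predictor case (again, the data and names are illustrative, not from the book), you can fit the plane Y = a + b1*X1 + b2*X2 by least squares with NumPy:

```python
import numpy as np

# Illustrative data: two predictors (x1, x2) and one outcome (y)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([5.2, 5.9, 10.1, 10.8, 15.2, 15.8])

# Design matrix: a column of ones for the intercept, then one column per predictor
X = np.column_stack([np.ones_like(x1), x1, x2])

# The least-squares solution minimizes the SSQ of vertical distances from the plane
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef
print(f"Fitted plane: Y = {a:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")
```

The same code handles any number of predictors: each additional predictor just adds one more column to the design matrix, which is precisely the hyperplane case in higher dimensions.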
In the following sections, we define some basic terms related to multiple regres-
sion, and explain when you should use it.
Defining a few important terms
Multiple regression is formally known as the ordinary multiple linear regression model. What a mouthful! Here's what the terms mean:
» Ordinary: The outcome variable is a continuous numerical variable whose random fluctuations are normally distributed (see Chapter 24 for more about normal distributions).
» Multiple: The model has more than one predictor variable.
» Linear: Each predictor variable is multiplied by a parameter, and these products are added together to estimate the predicted value of the outcome variable. You can also have one more parameter thrown in that isn't multiplied by anything — it's called the constant term or the Intercept. The following are examples of linear functions used in regression:
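For instance (an illustrative sketch, where a, b, and c are parameters and X and W are predictor variables):

Y = a + bX

Y = a + bX + cW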